Skip to content
This repository has been archived by the owner on Jan 13, 2025. It is now read-only.

v1.15: Remove recursive read lock that could deadlock Blockstore (backport of #30203) #30300

Merged
merged 1 commit into from
Feb 14, 2023

Conversation

mergify[bot]
Copy link
Contributor

@mergify mergify bot commented Feb 13, 2023

This is an automatic backport of pull request #30203 done by Mergify.


Mergify commands and options

More conditions and actions can be found in the documentation.

You can also trigger Mergify actions by commenting on this pull request:

  • @Mergifyio refresh will re-evaluate the rules
  • @Mergifyio rebase will rebase this PR on its base branch
  • @Mergifyio update will merge the base branch into this PR
  • @Mergifyio backport <destination> will backport this PR on <destination> branch

Additionally, on Mergify dashboard you can:

  • look at your merge queues
  • generate the Mergify configuration with the config editor.

Finally, you can contact us on https://mergify.com

This deadlock could only occur on nodes that call
Blockstore::get_rooted_block(). Regular validators don't call this
function, RPC nodes and nodes that have BigTableUploadService enabled
do.

Blockstore::get_rooted_block() grabs a read lock on lowest_cleanup_slot
right at the start to check if the block has been cleaned up, and to
ensure it doesn't get cleaned up during execution. As part of the
callstack of get_rooted_block(), Blockstore::get_completed_ranges() will
get called, which also grabs a read lock on lowest_cleanup_slot.

If LedgerCleanupService attempts to grab a write lock between the read
lock calls, we could hit a deadlock if priority is given to the write
lock request in this scenario.

This change removes the call to get the read lock in
get_completed_ranges(). The lock is only held for the scope of this
function, which is a single rocksdb read and thus not needed. This does
mean that a different error will be returned if the requested slot was
below lowest_cleanup_slot. Previously, a BlockstoreError::SlotCleanedUp
would have been thrown; the rocksdb error will be bubbled up now. Note
that callers of get_rooted_block() will still get the SlotCleanedUp
error when appropriate because get_rooted_block() grabs the lock. If the
slot is unavailable, it will return immediately. If the slot is
available, get_rooted_block() holding the lock means the slot will
remain available.

(cherry picked from commit 328b674)
@steviez
Copy link
Contributor

steviez commented Feb 13, 2023

I marked this for backport to v1.15 given that the issue should correspond to the Rust change of behavior that came with Rust 1.62. v1.15 is our first branch with >= 1.62; v1.14 is on 1.60 so no need to backport there.

@CriesofCarrots
Copy link
Contributor

Yay, thanks for the comment!

@steviez steviez merged commit 834cf7d into v1.15 Feb 14, 2023
@steviez steviez deleted the mergify/bp/v1.15/pr-30203 branch February 14, 2023 04:59
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants